bayesian network learning
The Complexity of Bayesian Network Learning: Revisiting the Superstructure
We investigate the parameterized complexity of Bayesian Network Structure Learning (BNSL), a classical problem that has received significant attention in empirical but also purely theoretical studies. We follow up on previous works that have analyzed the complexity of BNSL w.r.t. the so-called superstructure of the input. While known results imply that BNSL is unlikely to be fixed-parameter tractable even when parameterized by the size of a vertex cover in the superstructure, here we show that a different kind of parameterization - notably by the size of a feedback edge set - yields fixed-parameter tractability. We proceed by showing that this result can be strengthened to a localized version of the feedback edge set, and provide corresponding lower bounds that complement previous results to provide a complexity classification of BNSL w.r.t.
The Complexity of Bayesian Network Learning: Revisiting the Superstructure
We investigate the parameterized complexity of Bayesian Network Structure Learning (BNSL), a classical problem that has received significant attention in empirical but also purely theoretical studies. We follow up on previous works that have analyzed the complexity of BNSL w.r.t. the so-called superstructure of the input. While known results imply that BNSL is unlikely to be fixed-parameter tractable even when parameterized by the size of a vertex cover in the superstructure, here we show that a different kind of parameterization - notably by the size of a feedback edge set - yields fixed-parameter tractability. We proceed by showing that this result can be strengthened to a localized version of the feedback edge set, and provide corresponding lower bounds that complement previous results to provide a complexity classification of BNSL w.r.t. In particular, while the bulk of past theoretical work on the topic assumed the use of the so-called non-zero representation, here we prove that if an additive representation can be used instead then BNSL becomes fixed-parameter tractable even under significantly milder restrictions to the superstructure, notably when parameterized by the treewidth alone.
Data Analytics with Differential Privacy
Differential privacy is the state-of-the-art definition for privacy, guaranteeing that any analysis performed on a sensitive dataset leaks no information about the individuals whose data are contained therein. In this thesis, we develop differentially private algorithms to analyze distributed and streaming data. In the distributed model, we consider the particular problem of learning -- in a distributed fashion -- a global model of the data, that can subsequently be used for arbitrary analyses. We build upon PrivBayes, a differentially private method that approximates the high-dimensional distribution of a centralized dataset as a product of low-order distributions, utilizing a Bayesian Network model. We examine three novel approaches to learning a global Bayesian Network from distributed data, while offering the differential privacy guarantee to all local datasets. Our work includes a detailed theoretical analysis of the distributed, differentially private entropy estimator which we use in one of our algorithms, as well as a detailed experimental evaluation, using both synthetic and real-world data. In the streaming model, we focus on the problem of estimating the density of a stream of users, which expresses the fraction of all users that actually appear in the stream. We offer one of the strongest privacy guarantees for the streaming model, user-level pan-privacy, which ensures that the privacy of any user is protected, even against an adversary that observes the internal state of the algorithm. We provide a detailed analysis of an existing, sampling-based algorithm for the problem and propose two novel modifications that significantly improve it, both theoretically and experimentally, by optimally using all the allocated "privacy budget."
- North America > United States > Wisconsin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Greece (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Efficient Conversion of Bayesian Network Learning into Quadratic Unconstrained Binary Optimization
Ising machines (IMs) are a potential breakthrough in the NP-hard problem of score-based Bayesian network (BN) learning. To utilize the power of IMs, encoding of BN learning into quadratic unconstrained binary optimization (QUBO) has been proposed using up to $\mathcal{O}(N^2)$ bits, for $N$ variables in BN and $M = 2$ parents each. However, this approach is usually infeasible owing to the upper bound of IM bits when $M \geq 3$. In this paper, we propose an efficient conversion method for BN learning into QUBO with a maximum of $\sum_n (\Lambda_n - 1) + \binom N2$ bits, for $\Lambda_n$ parent set candidates each. The advance selection of parent set candidates plays an essential role in reducing the number of required bits. We also develop a pre-processing algorithm based on the capabilities of a classification and regression tree (CART), which allows us to search for parent set candidates consistent with score minimization in a realistic timeframe.Our conversion method enables us to more significantly reduce the upper bound of the required bits in comparison to an existing method, and is therefore expected to make a significant contribution to the advancement of scalable score-based BN learning.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > New York (0.04)
- (4 more...)
Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimised Implementations in the bnlearn R Package
It is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based structure learning benefit from past and current research in optimisation theory, which can be adapted to the task by using the network score as the objective function to maximise. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. The only optimisation in widespread use, backtracking, leverages the symmetries implied by the definitions of neighbourhood and Markov blanket. In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelise constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Washington > King County > Redmond (0.04)
- (3 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Hematology (0.46)
- Education > Educational Technology > Educational Software (0.46)